15 research outputs found

    Hierarchical Label Partitioning for Large Scale Classification

    Extreme classification, in which the number of classes is very large, has received considerable attention over the last decade. The usual efficient multi-class classification approaches were not designed to deal with such a large number of classes. A particular issue in large-scale problems is the computational complexity of classification: the best multi-class approaches generally have a complexity linear in the number of classes, which prevents them from scaling up. Recent work has focused on hierarchical classification as a way to speed up the classification of new instances. A priori information on labels is not always available, nor always useful, for building hierarchical models. Finding a suitable hierarchical organization of the labels is thus crucial, as the accuracy of the model depends strongly on how labels are assigned through the label tree. We propose a new algorithm that iteratively builds a hierarchical label structure, using a partitioning algorithm that simultaneously optimizes the structure in terms of classification complexity and solves the label partitioning problem in order to achieve high classification performance. Starting from a flat tree structure, our algorithm iteratively selects a node to expand by adding a new level of nodes between the selected node and its children. This operation increases the speed-up of the classification process. Once the node is selected, the best partitioning of the classes has to be computed. We propose a measure based on maximizing the expected loss of the sub-levels in order to minimize the global error of the structure. This choice forces hard-to-separate classes to be grouped together in the same partitions at the first levels of the tree, and it defers errors to deep levels of the structure, where they have no impact on the accuracy of other classes.
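The node-expansion step described in this abstract can be illustrated with a minimal sketch. It only shows how inserting a level between a node and its children reduces the number of child scores evaluated at inference. The `Node` class, the helper names, and the even four-way split are hypothetical, and the paper's loss-based partitioning criterion is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: Optional[int] = None                      # set only on leaves (one class per leaf)
    children: List["Node"] = field(default_factory=list)


def flat_tree(n_classes: int) -> Node:
    """Root whose children are the classes themselves (the flat starting point)."""
    return Node(children=[Node(label=c) for c in range(n_classes)])


def expand(node: Node, groups: List[List[int]]) -> None:
    """Insert one level between `node` and its children.

    `groups` gives, for each new intermediate node, the indices of the
    existing children it adopts. How to choose the groups is the hard
    part; here they are supplied by the caller (an assumption).
    """
    old = node.children
    node.children = [Node(children=[old[i] for i in g]) for g in groups]


def worst_case_cost(node: Node) -> int:
    """Child scores evaluated on the most expensive root-to-leaf path,
    assuming every internal node scores all of its children."""
    if not node.children:
        return 0
    return len(node.children) + max(worst_case_cost(c) for c in node.children)


root = flat_tree(16)
flat_cost = worst_case_cost(root)                    # one score per class
expand(root, [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]])
tree_cost = worst_case_cost(root)                    # 4 group scores + 4 class scores
```

With 16 classes, the flat structure evaluates 16 scores per instance, while the expanded one evaluates 8 on any path. An error made at the new intermediate level cannot be recovered further down, which is why the choice of groups matters for accuracy.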

    Sequential Dynamic Classification for Large Scale Multiclass Problems

    Extreme multi-class classification concerns classification problems with a very large number of classes, up to several million. Such problems have become frequent in many practical applications. Until recently, most classification methods had an inference complexity at least linear in the number of classes. Several directions have recently been explored to limit this complexity, but learning an optimal compromise between inference complexity and classification accuracy is still a largely open challenge. We propose a novel ensemble learning approach in which classifiers are dynamically chosen from a pre-trained set and iteratively combined in order to achieve an efficient trade-off between inference complexity and classification accuracy. The proposed model uses statistical bounds to discard irrelevant classes during inference and to choose the most informative classifier with respect to the information gathered during the previous steps. Experiments on real datasets from recent challenges show that the proposed approach achieves very high classification accuracy compared with baselines and recently proposed approaches at similar inference time complexity.
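The idea of discarding classes during inference with statistical bounds can be sketched as follows. This is not the authors' model: the abstract does not say which bounds are used, so the sketch assumes Hoeffding-style confidence intervals on the mean score of each class (scores in [0, 1]). It also applies the classifiers in a fixed order instead of choosing the most informative one. The names `sequential_predict`, `toy_scorer`, and `delta` are hypothetical.

```python
import math
from typing import Callable, Dict, Hashable, Iterable, Sequence, Set, Tuple

# A scorer maps (instance, surviving classes) to a score in [0, 1] per class.
Scorer = Callable[[object, Set[Hashable]], Dict[Hashable, float]]


def sequential_predict(
    x: object,
    classifiers: Sequence[Scorer],
    classes: Iterable[Hashable],
    delta: float = 0.05,
) -> Tuple[Hashable, int]:
    """Return (predicted class, number of classifiers actually evaluated)."""
    alive = set(classes)
    sums = {c: 0.0 for c in alive}
    used = 0
    for clf in classifiers:
        used += 1
        scores = clf(x, alive)                 # only surviving classes are scored
        for c in alive:
            sums[c] += scores[c]
        # Hoeffding radius for the mean of `used` scores bounded in [0, 1].
        radius = math.sqrt(math.log(2.0 / delta) / (2.0 * used))
        means = {c: sums[c] / used for c in alive}
        best_lower = max(means.values()) - radius
        # Drop every class whose upper bound falls below the best lower bound.
        alive = {c for c in alive if means[c] + radius >= best_lower}
        if len(alive) == 1:
            break
    return max(alive, key=lambda c: sums[c]), used


def toy_scorer(x: object, alive: Set[Hashable]) -> Dict[Hashable, float]:
    """Hypothetical perfect scorer: full score for the true class, zero otherwise."""
    return {c: 1.0 if c == x else 0.0 for c in alive}


label, n_used = sequential_predict(2, [toy_scorer] * 20, range(5))
```

In this toy run the loop stops before all 20 classifiers have been evaluated, because the intervals of the wrong classes separate from the interval of the correct one. The trade-off between inference cost and accuracy is controlled here by `delta`.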

    Clinical, radiologic, pathologic, and molecular characteristics of long-term survivors of diffuse intrinsic pontine glioma (DIPG): a collaborative report from the International and European Society for Pediatric Oncology DIPG registries

    Purpose: Diffuse intrinsic pontine glioma (DIPG) is a brainstem malignancy with a median survival of < 1 year. The International and European Society for Pediatric Oncology DIPG Registries collaborated to compare clinical, radiologic, and histomolecular characteristics between short-term survivors (STSs) and long-term survivors (LTSs).
    Materials and Methods: Data abstracted from registry databases included patients from North America, Australia, Germany, Austria, Switzerland, the Netherlands, Italy, France, the United Kingdom, and Croatia.
    Results: Among 1,130 pediatric and young adult patients with radiographically confirmed DIPG, 122 (11%) were excluded. Of the 1,008 remaining patients, 101 (10%) were LTSs (survival ≥ 2 years). Median survival time was 11 months (interquartile range, 7.5 to 16 months), and 1-, 2-, 3-, 4-, and 5-year survival rates were 42.3% (95% CI, 38.1% to 44.1%), 9.6% (95% CI, 7.8% to 11.3%), 4.3% (95% CI, 3.2% to 5.8%), 3.2% (95% CI, 2.4% to 4.6%), and 2.2% (95% CI, 1.4% to 3.4%), respectively. LTSs, compared with STSs, more commonly presented at age < 3 or > 10 years (11% v 3% and 33% v 23%, respectively; P < .001) and with longer symptom duration (P < .001). STSs, compared with LTSs, more commonly presented with cranial nerve palsy (83% v 73%, respectively; P = .008), ring enhancement (38% v 23%, respectively; P = .007), necrosis (42% v 26%, respectively; P = .009), and extrapontine extension (92% v 86%, respectively; P = .04). LTSs more commonly received systemic therapy at diagnosis (88% v 75% for STSs; P = .005). Biopsies and autopsies were performed in 299 patients (30%) and 77 patients (10%), respectively; 181 tumors (48%) were molecularly characterized. LTSs were more likely to harbor a HIST1H3B mutation (odds ratio, 1.28; 95% CI, 1.1 to 1.5; P = .002).
    Conclusion: We report clinical, radiologic, and molecular factors that correlate with survival in children and young adults with DIPG, which are important for risk stratification in future clinical trials.

    Very large number of classes classification study

    The growth in the volume of data available today raises new problems for which machine learning has no suitable answers. The usual classification task, which assigns one or more classes to an example, is extended to problems with thousands or even millions of different classes. These problems open new research directions, such as reducing the complexity of classification, which is usually linear in the number of classes and becomes an issue when the number of classes is too large. Several families of solutions have emerged, such as building a hierarchy of classifiers or adapting ECOC-type ensemble methods. The work presented here proposes two new methods to address this extreme classification task. The first is a new asymmetric measure for partitioning classes in order to build a hierarchy of classes. The second explores a sequential, active algorithm that aggregates the most useful classifiers.

    Study of Classification in a Very Large Number of Categories

    The growth in the volume of data available today raises new problems for which machine learning has no suitable answers. The usual classification task, which assigns one or more classes to an example, is extended to problems with thousands or even millions of different classes. These problems open new research directions, such as reducing the complexity of classification, which is usually linear in the number of classes and becomes an issue when the number of classes is too large. Several families of solutions have emerged, such as building a hierarchy of classifiers or adapting ECOC-type ensemble methods. The work presented here proposes two new methods to address this extreme classification task. The first is a new asymmetric measure for partitioning classes in order to build a hierarchy of classes. The second explores a sequential, active algorithm that aggregates the most useful classifiers.

    ATM-dependent formation of a novel chromatin compartment regulates the Response to DNA Double Strand Breaks and the biogenesis of translocations

    This article is a preprint and has not been certified by peer review. Repair of DNA double-strand breaks (DSBs) is essential to safeguard genome integrity, but the contribution of chromosome folding to this process remains elusive. Here we unveil basic principles of chromosome dynamics upon DSBs in mammalian cells, controlled by key kinases of the DNA damage response. We report that ATM is responsible for the reinforcement of topologically associating domains (TADs) that experience a DSB. ATM further drives the formation of a new chromatin sub-compartment (the “D” compartment) upon clustering of damaged TADs decorated with γH2AX and 53BP1. “D” compartment formation mostly occurs in G1, is independent of cohesin, and is enhanced upon pharmacological inhibition of DNA-PK. Importantly, a subset of DNA damage-responsive genes that are upregulated following DSBs also physically localize in the D sub-compartment, which ensures their optimal activation and provides a function for DSB clustering in activating the DNA damage response. However, these DSB-induced changes in genome organization also come at the expense of an increased translocation rate, which we could also detect in cancer genomes. Overall, our work provides a function for DSB-induced compartmentalization in orchestrating the DNA damage response and highlights the critical impact of chromosome architecture on genomic instability.

    Can Structural MRI Radiomics Predict DIPG Histone H3 Mutation and Patient Overall Survival at Diagnosis Time?

    Identifying tumor phenotypes non-invasively from quantitative imaging features is a challenge faced by radiomics. This study investigated whether radiomic features measured at diagnosis from conventional structural MRI can predict histone H3 mutations and overall survival in patients with diffuse intrinsic pontine glioma. To this end, 316 features were extracted from the multimodal diagnostic MRI of 38 patients. Two approaches were proposed: a conventional estimation of features over the whole region of interest, and a mean, over this region, of local features computed from fixed-size patches. A feature selection pipeline was then developed. Three machine learning models for H3 mutation classification and three regression models for overall survival prediction were evaluated. The leave-one-out F1-weighted score of the SVM model combining imaging and clinical features reached 0.84, showing good prediction of H3 mutation from structural MRI. Some encouraging results were obtained for predicting overall survival, but they need to be confirmed on a larger number of patients.
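For readers unfamiliar with the evaluation protocol named in this abstract, the sketch below shows how a leave-one-out, support-weighted F1 score can be computed. It is an illustration under stated assumptions, not the study's pipeline: a nearest-centroid classifier stands in for the SVM, the six one-feature samples are invented toy data, and all function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        total += n * (2 * tp / (2 * tp + fp + fn))   # denominator >= n >= 1
    return total / len(y_true)


def fit_centroids(X: List[List[float]], y: List[int]) -> Dict[int, List[float]]:
    """Stand-in model (not an SVM): one mean feature vector per class."""
    groups: Dict[int, List[List[float]]] = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {lab: [sum(col) / len(rows) for col in zip(*rows)] for lab, rows in groups.items()}


def predict_centroid(model: Dict[int, List[float]], row: List[float]) -> int:
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, model[lab])))


def leave_one_out_predictions(X: List[List[float]], y: List[int],
                              fit: Callable, predict: Callable) -> List[int]:
    """Train on all samples but one, predict the held-out sample, repeat for each sample."""
    preds = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(model, X[i]))
    return preds


# Invented toy data: one feature, two well-separated classes.
X = [[0.0], [0.2], [0.1], [1.0], [0.9], [1.1]]
y = [0, 0, 0, 1, 1, 1]
loo_preds = leave_one_out_predictions(X, y, fit_centroids, predict_centroid)
loo_score = weighted_f1(y, loo_preds)
```

Pooling the held-out predictions before computing the score is one common convention for leave-one-out evaluation, since a per-fold F1 on a single sample is not informative. Whether the study followed this convention is not stated in the abstract.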

    Computation of reliable textural indices from multimodal brain MRI: suggestions based on a study of patients with diffuse intrinsic pontine glioma

    Few methodological studies on the robustness of widely used textural indices in MRI have been reported. In this context, this study aims to propose some rules for computing reliable textural indices from multimodal 3D brain MRI. Diagnosis and post-biopsy MR scans, including T1, post-contrast T1, T2, and FLAIR images, from thirty children with diffuse intrinsic pontine glioma (DIPG) were considered. The hybrid white stripe method was adapted to standardize MR intensities. Sixty textural indices were then computed for each modality in different regions of interest (ROI), including tumor and white matter (WM). Three types of intensity binning were compared: constant bin width and relative bounds; constant number of bins and relative bounds; constant number of bins and absolute bounds. The impact of the volume of the region was also tested within the WM. First, the mean Hellinger distance between patient-based intensity distributions decreased by a factor greater than 10 in WM and greater than 2.5 in gray matter after standardization. Regarding the binning strategy, the ranking of patients was highly correlated for 188/240 features when comparing with , but for only 20 when comparing with , and nine when comparing with . Furthermore, when using or texture indices reflected tumor heterogeneity as assessed visually by experts. Last, 41 features presented statistically significant differences between contralateral WM regions when ROI size varied slightly across patients, and none when using ROIs of the same size. For regions of similar size, 224 features were significantly different between WM and tumor. Valuable information from texture indices can be biased by methodological choices. The recommendations are to standardize intensities in MR brain volumes, to use intensity binning with a constant bin width, and to define regions with the same volume in order to obtain reliable textural indices.
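The binning strategies and the Hellinger distance mentioned in this abstract can be made concrete with a short sketch. It is a generic illustration, not the study's code: the function names, the bin width of 5, the four-bin setting, the bounds, and the four intensity values are all hypothetical.

```python
import math
from typing import List, Sequence


def bin_fixed_width(values: Sequence[float], width: float, lower: float) -> List[int]:
    """Constant bin width: the number of occupied bins depends on the intensity range."""
    return [int((v - lower) // width) for v in values]


def bin_fixed_number(values: Sequence[float], n_bins: int,
                     lower: float, upper: float) -> List[int]:
    """Constant number of bins between `lower` and `upper`.

    Relative bounds: pass the min and max of the region itself.
    Absolute bounds: pass fixed limits shared by all patients.
    Values outside the bounds are clipped to the first or last bin.
    """
    span = upper - lower
    return [min(n_bins - 1, max(0, int((v - lower) / span * n_bins))) for v in values]


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions (each summing to 1)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)


roi = [10.0, 12.5, 20.0, 30.0]                                      # invented intensities
width_relative = bin_fixed_width(roi, width=5.0, lower=min(roi))    # [0, 0, 2, 4]
count_relative = bin_fixed_number(roi, 4, min(roi), max(roi))       # [0, 0, 2, 3]
count_absolute = bin_fixed_number(roi, 4, lower=0.0, upper=40.0)    # [1, 1, 2, 3]
```

With relative bounds, the same raw intensity can land in different bins for different patients, because the bounds move with each region. A constant bin width keeps the intensity resolution of each bin fixed, which is meaningful only once intensities have been standardized across scans.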